@edeno edeno commented Oct 23, 2025

Summary

Implements a three-layer defense system to ensure safe, regression-free scientific development with Claude Code:

  • Layer 1 (Hooks): Automatic enforcement of environment and testing requirements
  • Layer 2 (Skills): Workflow guidance for TDD, numerical validation, and safe refactoring
  • Layer 3 (Documentation): Enhanced CLAUDE.md with operational rules and decision trees

Key Features

  • Environment Consistency: Conda environment checking for all Python commands
  • Regression Prevention: Snapshot change detection with approval gates, test-before-commit reminders
  • Numerical Accuracy: Tolerance specifications (1e-14 for refactoring, 1e-10 for algorithms), mathematical invariant verification
  • Guided Autonomy: Claude runs tests/validates automatically, asks permission for commits/snapshot updates
  • Pragmatic TDD: Test-first for new features, test-verify for simple bugs
  • JAX Integration: Special handling for JAX code optimization and validation

Components Delivered

Hooks (4 files)

  • pre-tool-use.sh - Environment validation, test reminders, snapshot protection
  • user-prompt-submit.sh - Snapshot change detection
  • lib/env_check.sh - Conda environment utilities
  • lib/numerical_validation.sh - Validation and approval utilities

Skills (3 files)

  • scientific-tdd - Pragmatic test-driven development workflow
  • numerical-validation - Comprehensive numerical correctness verification
  • safe-refactoring - Zero-tolerance behavior-preserving refactoring

Documentation (5 files)

  • Enhanced CLAUDE.md with critical operational rules, numerical standards, workflow guide
  • SAFE_DEVELOPMENT_GUIDE.md - Comprehensive user guide
  • SYSTEM_VERIFICATION.md - Complete verification report
  • Skills and Hooks README files

Testing (1 file)

  • test_safe_dev_system.sh - Integration test (24/24 checks passing)

Test Plan

  • All 24 integration tests passing
  • Hooks execute correctly (environment check, snapshot protection)
  • Skills have proper XML headers
  • Documentation complete and accurate
  • Git worktrees properly ignored
  • No regressions in existing functionality

Verification

Run the integration test:

./tests/test_safe_dev_system.sh

Expected: All tests pass (24/24)
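
A pass/fail tally like the one above can be produced by a very small harness. The sketch below is an assumption about how such a script might be organized, not the contents of tests/test_safe_dev_system.sh; the `check` helper and the two example checks are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check harness: run a command, record pass or fail, keep a tally.
passed=0
total=0

check() {
  local description="$1"
  shift
  total=$((total + 1))
  # Run the remaining arguments as a command and silence its output.
  if "$@" >/dev/null 2>&1; then
    passed=$((passed + 1))
    echo "PASS: $description"
  else
    echo "FAIL: $description"
  fi
}

# Illustrative checks: the first should pass, the second should fail.
check "current directory exists" test -d .
check "a missing file is reported" test -e ./definitely-not-a-real-file

echo "Result: $passed/$total checks passed"
```

A real script would typically exit non-zero when `passed` and `total` differ, so that CI can detect the failure.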

Usage

After merging, Claude Code will automatically:

  1. Follow operational rules from enhanced CLAUDE.md
  2. Use appropriate skills for different task types
  3. Enforce environment requirements via hooks
  4. Require approval for snapshot updates and commits

See docs/SAFE_DEVELOPMENT_GUIDE.md for complete usage instructions.

🤖 Generated with Claude Code

edeno and others added 12 commits October 23, 2025 17:59
- env_check.sh: Conda environment detection and validation
- numerical_validation.sh: Snapshot and invariant checking utilities
- Fix .gitignore to allow .claude/hooks/lib/ (was blocked by /lib/ pattern)

Part of safe scientific development system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Warns when Python commands run outside conda environment
- Reminds to run tests before commits
- Blocks snapshot updates without approval
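
One way to gate snapshot updates is to inspect the command before it runs. The sketch below assumes snapshots are refreshed with pytest's `--snapshot-update` flag (the flag used by the syrupy plugin; the PR text does not say which snapshot tool is in use) and that approval is signalled through a hypothetical `SNAPSHOT_UPDATE_APPROVED` variable. The actual hook's approval mechanism may differ.

```shell
#!/usr/bin/env bash
# Hypothetical gate: return 0 to allow a command, 1 to block it.
allow_command() {
  local cmd="$1"
  case "$cmd" in
    *--snapshot-update*)
      # Assumed approval signal; the real hook may use a different mechanism.
      if [ "${SNAPSHOT_UPDATE_APPROVED:-0}" = "1" ]; then
        return 0
      fi
      echo "BLOCKED: snapshot update requires explicit approval" >&2
      return 1
      ;;
    *)
      # Anything that does not touch snapshots passes through.
      return 0
      ;;
  esac
}
```

For example, `allow_command "pytest --snapshot-update"` fails until the approval variable is set to `1`, while `allow_command "pytest tests/"` always succeeds.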

Part of safe scientific development system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The find command on line 42 searched for files matching the pattern
"*.pytest_cache", but no such files exist. The .pytest_cache directory
contains entries with various names (such as .gitignore, CACHEDIR.TAG,
README.md, and v/cache/), none of which end in .pytest_cache.

Changed from:
  find .pytest_cache -name "*.pytest_cache" -mmin -5

To:
  find .pytest_cache -type f -mmin -5

This correctly finds any files in the .pytest_cache directory that were
modified within the last 5 minutes, properly detecting recent test runs.
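
The corrected check can be wrapped in a small function. This is a sketch under assumptions: the name `tests_ran_recently` and its arguments are hypothetical, and `-mmin` is relied on as it is implemented in GNU and BSD find.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any file in the pytest cache changed recently.
tests_ran_recently() {
  local cache_dir="${1:-.pytest_cache}"
  local minutes="${2:-5}"
  # No cache directory means pytest has not run here at all.
  [ -d "$cache_dir" ] || return 1
  # -type f matches every regular file; -mmin -N keeps those modified
  # less than N minutes ago. One hit is enough, so stop at the first line.
  [ -n "$(find "$cache_dir" -type f -mmin "-$minutes" | head -n 1)" ]
}
```

A commit reminder could then be written as `tests_ran_recently || echo "Reminder: run the tests before committing"`.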

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Detects when snapshot files have changed
- Reminds Claude to provide full analysis before updates
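
Detecting snapshot changes can be as simple as filtering a list of modified paths. The sketch below assumes snapshots live in `__snapshots__` directories (a common convention, not confirmed by the PR text); `filter_snapshot_paths` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read file paths on stdin and print only those
# that sit inside a snapshot directory.
filter_snapshot_paths() {
  # Assumed layout: snapshot files live under a directory named __snapshots__.
  # `|| true` keeps the exit status at 0 when nothing matches.
  grep -E '(^|/)__snapshots__/' || true
}
```

Fed with `git diff --name-only | filter_snapshot_paths`, a non-empty result would indicate that snapshot files changed and that an analysis is due before any update is accepted.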

Part of safe scientific development system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Pragmatic TDD workflow for scientific code with numerical validation.

Part of safe scientific development system.

Comprehensive numerical validation workflow with tolerance guidelines.

Part of safe scientific development system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Zero-tolerance workflow for behavior-preserving refactoring.

Part of safe scientific development system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…and workflow guide

Add three critical sections to CLAUDE.md for safe scientific development:

**Task 7 - Critical Operational Rules:**
- Mandatory skills usage (scientific-tdd, numerical-validation, safe-refactoring, jax)
- Environment enforcement rules (conda activation required)
- Guided autonomy boundaries (what Claude can do vs. must ask permission)
- Snapshot update approval process with required 4-part analysis format

**Task 8 - Numerical Accuracy Standards:**
- When numerical validation is required (which files/components)
- Tolerance specifications table (1e-14 for refactoring, 1e-10 for algorithms)
- Mathematical invariants that must always hold (probabilities, stochastic matrices, etc.)
- Validation commands (property tests, golden regression, snapshots)
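
To make the tolerance and invariant rules concrete, here is a minimal sketch that uses awk for the floating-point arithmetic. The helper names `within_tol` and `rows_sum_to_one` are hypothetical, the comparison is an absolute (not relative) difference, and the project's real checks presumably live in its Python test suite.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if |a - b| <= tol (absolute tolerance).
within_tol() {
  awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
    d = a - b
    if (d < 0) d = -d
    exit (d <= tol + 0) ? 0 : 1
  }'
}

# Hypothetical helper: read a whitespace-separated matrix on stdin and
# succeed if every row sums to 1 within tol (a row-stochastic check).
rows_sum_to_one() {
  awk -v tol="${1:-1e-10}" '
    {
      s = 0
      for (i = 1; i <= NF; i++) s += $i
      d = s - 1
      if (d < 0) d = -d
      if (d > tol + 0) bad = 1
    }
    END { exit bad ? 1 : 0 }
  '
}
```

Under the tolerances listed above, a refactoring check would call `within_tol "$old" "$new" 1e-14`, while an algorithm change would use `1e-10`.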

**Task 9 - Workflow Selection Guide:**
- Decision tree for selecting appropriate workflow/skill
- Task-based guidance (new features, bugs, refactoring, JAX, etc.)
- JAX code requirements and best practices
- JAX-specific validation checklist

These enhancements provide Claude with clear operational guidelines, numerical
accuracy requirements, and workflow selection criteria for scientific development.

Before: 132 lines
After: 357 lines
Added: 225 lines

Part of safe scientific development system implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Task 10: Create .claude/skills/README.md
- Overview of all 3 skills (scientific-tdd, numerical-validation, safe-refactoring)
- Usage guidance for each skill
- Workflow summaries
- Integration and maintenance information

Task 11: Create .claude/hooks/README.md
- Overview of both hooks (pre-tool-use.sh, user-prompt-submit.sh)
- Utility library documentation (env_check.sh, numerical_validation.sh)
- Testing procedures and expected behavior
- Debugging guidance
- Integration with skills explanation

Both READMEs provide clear documentation of the safe scientific
development system components.

Part of safe scientific development system implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Task 12: Integration test script (tests/test_safe_dev_system.sh)
- Tests all 8 categories: hook utilities, hooks, skills, CLAUDE.md,
  functional tests, snapshot protection, git config, documentation
- Validates complete system installation and functionality
- All tests passing

Task 13: Comprehensive user guide (docs/SAFE_DEVELOPMENT_GUIDE.md)
- Quick start for users and Claude
- Three-layer system explanation
- Common workflow examples with checkpoints
- Approval gate processes with checklists
- Numerical tolerance guidelines
- Troubleshooting guide
- Customization instructions
- Best practices and FAQ

Both are final documentation/testing deliverables for the safe
scientific development system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Complete verification of all components, functional testing results,
and maintenance procedures.

Final deliverable for safe scientific development system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add proper XML metadata headers to scientific-tdd skill
- Add proper XML metadata headers to numerical-validation skill
- Add proper XML metadata headers to safe-refactoring skill

Headers follow Claude Code skill format with name, description, tags, and version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@edeno edeno requested a review from Copilot October 23, 2025 22:50

Copilot AI left a comment

Pull Request Overview

This PR implements a comprehensive three-layer defense system to ensure safe, regression-free scientific development with Claude Code. The system combines automatic enforcement through hooks, workflow guidance through skills, and enhanced documentation to prevent numerical regressions and maintain code quality.

Key changes include:

  • Layer 1: Hooks that automatically enforce environment requirements, test reminders, and snapshot protection
  • Layer 2: Three skills (scientific-tdd, numerical-validation, safe-refactoring) providing structured workflows for different development scenarios
  • Layer 3: Enhanced CLAUDE.md with operational rules, numerical standards, and workflow decision trees

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_safe_dev_system.sh | Integration test suite validating all system components |
| docs/SYSTEM_VERIFICATION.md | Comprehensive verification report documenting system status and capabilities |
| docs/SAFE_DEVELOPMENT_GUIDE.md | User guide explaining workflows, approval gates, and troubleshooting |
| CLAUDE.md | Enhanced with critical operational rules, numerical standards, and workflow selection guide |
| .claude/skills/scientific-tdd/skill.md | Pragmatic TDD workflow for new features with numerical validation |
| .claude/skills/safe-refactoring/skill.md | Zero-tolerance refactoring workflow ensuring exact behavioral match |
| .claude/skills/numerical-validation/skill.md | Comprehensive numerical correctness verification workflow |
| .claude/skills/README.md | Overview of available skills and their usage patterns |
| .claude/hooks/user-prompt-submit.sh | Post-prompt hook detecting snapshot changes |
| .claude/hooks/pre-tool-use.sh | Pre-execution hook enforcing environment and snapshot requirements |
| .claude/hooks/lib/numerical_validation.sh | Utilities for snapshot detection and approval management |
| .claude/hooks/lib/env_check.sh | Utilities for conda environment validation |
| .claude/hooks/README.md | Comprehensive hook documentation and debugging guide |

edeno and others added 4 commits October 23, 2025 18:53
- Fix import sorting (ruff I001)
- Remove unused variable assignment (ruff F841)
- Remove trailing whitespace (ruff W291)
- Remove whitespace from blank lines (ruff W293)
- Remove unused import (ruff F401)

All auto-fixed with ruff check --fix and ruff format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Black and ruff have slightly different formatting preferences.
Applying black formatting to match CI requirements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove black formatting check from CI workflow
- Use ruff format exclusively for code formatting
- Reformat all files with ruff format

Ruff provides equivalent formatting to black with additional
linting capabilities, simplifying the toolchain.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Previous commit included the workflow file but the edit didn't apply correctly.
This commit properly removes the black formatting check step.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@edeno edeno merged commit e930700 into main Oct 24, 2025
4 of 7 checks passed
@edeno edeno deleted the feature/safe-scientific-dev branch October 24, 2025 02:13